 syllable count


MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

Cho, Woohyun, Kim, Youngmin, Lee, Sunghyun, Yu, Youngjae

arXiv.org Artificial Intelligence

Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.


Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control

Chae, Yunkee, Shin, Eunsik, Suntae, Hwang, Paik, Seungryeol, Lee, Kyogu

arXiv.org Artificial Intelligence

Lyrics generation presents unique challenges, particularly in achieving precise syllable control while adhering to song form structures such as verses and choruses. Conventional line-by-line approaches often lead to unnatural phrasing, underscoring the need for more granular syllable management. We propose a framework for lyrics generation that enables multi-level syllable control at the word, phrase, line, and paragraph levels, aware of song form. Our approach generates complete lyrics conditioned on input text and song form, ensuring alignment with specified syllable constraints. Generated lyrics samples are available at: https://tinyurl.com/lyrics9999


Design and Implementation of a Tool for Extracting Uzbek Syllables

Salaev, Ulugbek, Kuriyozov, Elmurod, Matlatipov, Gayrat

arXiv.org Artificial Intelligence

The accurate syllabification of words plays a vital role in various Natural Language Processing applications. Syllabification is a versatile linguistic tool with applications in linguistic research, language technology, education, and other fields where understanding and processing language is essential. In this paper, we present a comprehensive approach to syllabification for the Uzbek language, including rule-based techniques and machine learning algorithms. Our rule-based approach divides words into syllables, generates hyphenations for line breaks, and counts syllables. Additionally, we collected a dataset of word-syllable mappings, hyphenations, and syllable counts, which we use both to train machine learning models that predict syllable counts and to evaluate the proposed model. Our experiments show that both approaches are effective and efficient, each achieving an accuracy exceeding 99%. This study provides valuable insights and recommendations for future research on syllabification and related areas, not only in the Uzbek language itself but also in other closely related low-resource Turkic languages.
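As a rough illustration of the rule-based syllable counting mentioned in this abstract, the sketch below counts syllables in a Latin-script Uzbek word by counting vowel letters. It assumes that every syllable contains exactly one vowel and that the letter oʻ is written as "o" followed by a modifier mark, so it is already covered by counting "o". The function name `count_syllables` is hypothetical; this is not the authors' tool and it does not produce syllable boundaries or hyphenations.

```python
# Vowel letters of the Latin-script Uzbek alphabet. The letter "oʻ" is
# typed as "o" plus a modifier mark, so counting "o" also covers it.
UZBEK_VOWELS = set("aeiou")


def count_syllables(word: str) -> int:
    """Estimate the syllable count of a Latin-script Uzbek word.

    Assumption (not taken from the paper): each syllable contains
    exactly one vowel letter, so the count equals the number of vowels.
    """
    return sum(1 for ch in word.lower() if ch in UZBEK_VOWELS)


if __name__ == "__main__":
    for w in ["kitob", "maktab", "oʻzbek"]:
        print(w, count_syllables(w))
```

A full syllabifier would additionally need rules for where to place each boundary between consonants, which this count-only sketch deliberately leaves out.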


A Computational Evaluation Framework for Singable Lyric Translation

Kim, Haven, Watanabe, Kento, Goto, Masataka, Nam, Juhan

arXiv.org Artificial Intelligence

Lyric translation plays a pivotal role in amplifying the global resonance of music, bridging cultural divides, and fostering universal connections. Translating lyrics, unlike conventional translation tasks, requires a delicate balance between singability and semantics. In this paper, we present a computational framework for the quantitative evaluation of singable lyric translation, which seamlessly integrates musical, linguistic, and cultural dimensions of lyrics. Our comprehensive framework consists of four metrics that measure syllable count distance, phoneme repetition similarity, musical structure distance, and semantic similarity. To substantiate the efficacy of our framework, we collected a singable lyrics dataset, which precisely aligns English, Japanese, and Korean lyrics on a line-by-line and section-by-section basis, and conducted a comparative analysis between singable and non-singable lyrics. Our multidisciplinary approach provides insights into the key components that underlie the art of lyric translation and establishes a solid groundwork for the future of computational lyric translation assessment.
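To make the idea of a syllable count distance concrete, here is a minimal sketch of one possible line-level formulation: the mean relative difference between the syllable counts of aligned source and translated lines. This is an illustrative definition chosen for this example, not the formula used in the paper, and the function name `syllable_count_distance` is hypothetical. Per-line syllable counts are taken as input because counting syllables is language-specific.

```python
from typing import Sequence


def syllable_count_distance(
    source_counts: Sequence[int], target_counts: Sequence[int]
) -> float:
    """Mean relative syllable-count difference over aligned lines.

    Illustrative definition (an assumption, not the paper's metric):
    for each line pair, |s - t| / max(s, t), averaged over all lines.
    Returns 0.0 when every line matches exactly.
    """
    if len(source_counts) != len(target_counts):
        raise ValueError("lyrics must be aligned line by line")
    if not source_counts:
        return 0.0
    total = 0.0
    for s, t in zip(source_counts, target_counts):
        longest = max(s, t)
        if longest > 0:  # two empty lines contribute zero distance
            total += abs(s - t) / longest
    return total / len(source_counts)


if __name__ == "__main__":
    # Second pair of lines matches; the first differs by half its length.
    print(syllable_count_distance([8, 6], [4, 6]))
```

Under this definition a singable translation, which must fit the same melody, would be expected to score closer to zero than a literal, non-singable one.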


Unsupervised Melody-Guided Lyrics Generation

Tian, Yufei, Narayan-Chen, Anjali, Oraby, Shereen, Cervone, Alessandra, Sigurdsson, Gunnar, Tao, Chenyang, Zhao, Wenbo, Chung, Tagyoung, Huang, Jing, Peng, Nanyun

arXiv.org Artificial Intelligence

Automatic song writing is a topic of significant practical interest. However, its research is largely hindered by the lack of training data due to copyright concerns and challenged by its creative nature. Most noticeably, prior works often fall short of modeling the cross-modal correlation between melody and lyrics due to limited parallel data, hence generating lyrics that are less singable. Existing works also lack effective mechanisms for content control, a much desired feature for democratizing song creation for people with limited music background. In this work, we propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data. Instead, we design a hierarchical lyric generation framework that disentangles training (based purely on text) from inference (melody-guided text generation). At inference time, we leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process. Evaluation results show that our model can generate high-quality lyrics that are more singable, intelligible, coherent, and in rhyme than strong baselines including those supervised on parallel data.


Autonomous Haiku Generation

Aguiar, Rui, Liao, Kevin

arXiv.org Artificial Intelligence

Artificial Intelligence is an excellent tool to improve efficiency and lower cost in many quantitative real-world applications, but what if the task is not easily defined? What if the task is generating creativity? Poetry is a creative endeavor that is highly difficult to both grasp and achieve with any level of competence. As Rita Dove, a famous American poet and author, states, "Poetry is language at its most distilled and most powerful." Taking Dove's quote as an inspiration, our task was to generate high-quality haikus using artificial intelligence and deep learning.